🎮 Reinforcement Learning - DefB · Scour

Show HN: Fighting the War Against Expensive Reinforcement Learning

cadenza-landing-qtu7gbjwb-akshparekh123-3457s-projects.vercel.app·18h·

Discuss: Hacker News

Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards

arxiv.org·20h

Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling

arxiv.org·20h

🧠 Machine Learning

A multi-agent reinforcement learning approach to autonomous aircraft taxiing with taxiing time, fuel consumption, and emission optimization

sciencedirect.com·1d

check out this article on Reinforcement Learning with R: Origins, Real-Life Applications, and Practical Implementation

dev.to·2d·

Discuss: DEV

Optimal timing for superintelligence

marginalrevolution.com·1h

Truth and paradox in the theory of finite and infinite games, Owens Memorial Lecture, Wayne State University, April 2026

jdh.hamkins.org·22h

λ Functional Programming

BetaZero V2: A Diffusion Model for Setting Boulder Problems

evmojo37.substack.com·2h·

Discuss: Substack

📊 Data Science

A Conceptual Framework for Exploration Hacking

lesswrong.com·9h

λ Functional Programming

How to Leverage Explainable AI for Better Business Decisions

towardsdatascience.com·10h

Feedback Control for Computer Systems

janert.org·17h

Optimizing post-disaster road restoration with reinforcement learning: A traveler-behavior-aware approach

sciencedirect.com·9h

Artificial Intelligence and the Passivity Problem

psychologytoday.com·7h

Observe emergent behavior in autonomous multi-agent LLM networks

agents.glide2.app·2d·

Discuss: Hacker News

Entropic Balance with Feedback Control: Information Equalities and Tight Inequalities

link.aps.org·2d

v6 (Code 2 here) — Most complete architecture. This version is faster than my old v5, statistically correct, has all the advanced psychology/network features, and produces stunning visualizations

gist.github.com·6h·

Discuss: r/C_Programming

📊 Data Science

Show HN: A minimal online decision maker

decisionmaker.online·1d·

Discuss: Hacker News

For real game-theoretic reasoning, we need best response in imperfect information games

weyxie.bearblog.dev·3d·

Discuss: Hacker News

New technology in programming and poker

natemeyvis.com·3h

Show HN: I taught AI to remember. Then it warned me

github.com·48m·

Discuss: Hacker News
